Supplemental Material for Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Authors
Abstract
We present further empirical and qualitative results for both image and video description. For the image description task, we explore averaging weight vectors before transfer, illustrate errors made by the model when no unpaired text data is used during training, and provide descriptions generated by DCC for a large variety of novel object categories in ImageNet. For the video description task, we include results when training the language model with an external text corpus, as well as more DCC descriptions of novel objects in video.
Similar resources
Deep Matching Autoencoders
Increasingly many real world tasks involve data in multiple modalities or views. This has motivated the development of many effective algorithms for learning a common latent space to relate multiple domains. However, most existing cross-view learning algorithms assume access to paired data for training. Their applicability is thus limited as the paired data assumption is often violated in pract...
Deep Interactive Region Segmentation and Captioning
With recent innovations in dense image captioning, it is now possible to describe every object of the scene with a caption while objects are determined by bounding boxes. However, interpretation of such an output is not trivial due to the existence of many overlapping bounding boxes. Furthermore, in current captioning frameworks, the user is not able to involve personal preferences to exclude o...
Automated Image Captioning Using Nearest-Neighbors Approach Driven by Top-Object Detections
The significant performance gains in deep learning coupled with the exponential growth of image and video data on the Internet have resulted in the recent emergence of automated image captioning systems. Two broad paradigms have emerged in automated image captioning, i.e., generative model-based approaches and retrieval-based approaches. Although generative model-based approaches that use the r...
Similarity Perception of Novel Objects
We develop a model of perceptual similarity judgment based on re-training a deep convolution neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience. The re-training process effectively performs distance metric learning under the object persistency constraints, to modify the view-manifol...
Seeing with Humans: Gaze-Assisted Neural Image Captioning
Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous works demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear if gaze can also be beneficial for scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captionin...